Part I: R Introduction
What is R?
R is a programming language used for
statistical analysis and graphics. It is an open-source implementation of
S, a programming language originally developed at AT&T's Bell
Laboratories; S-PLUS is a commercial implementation of the same language.
Why R?
- Open source, cross-platform, and free
- Great for reproducibility
- Interdisciplinary and extensible
- Tons of learning resources
- Works on data of all shapes and sizes
- Produces high-quality graphics
- Large and welcoming community
R: Object-Oriented Programming
Unlike many other statistical software packages, such as SAS and SPSS,
R will not spit out a mountain of output on the screen.
Instead, R returns an object containing
all the results. You, as a user, have the flexibility to choose which
results to extract or report.
R: Functional Programming
This feature allows us to write faster yet more compact code. For
example, a common theme in R programming is
avoidance of explicit iteration. Unlike in many other
statistical software packages, explicit loops are discouraged.
Instead, R provides functions that allow us
to express iterative behavior implicitly.
R: Polymorphic
R is also polymorphic, which means that a
single function can be applied to different types of inputs (much more
user friendly).
Such a function is called a generic function (If you are a
C++ programmer, you have seen a similar concept in virtual
functions).
Polymorphic - Example
Let's look at one example: plot()
- Plot a vector of numbers
- Plot some model results
No matter which purpose, we use the same function.
dats <- c(1,2,3,4)
plot(dats)

# Regression Analysis
par(mfrow=c(2,2),mar=c(2,4,2,2))
results <- lm(speed ~ dist,data=cars)
plot(results)

Why R Studio?
The default R interface is ugly!
Many students in this class are much more familiar with the Windows
operating system and have never been exposed to programming before, so we
will use R Studio, one of the free Graphical User Interfaces (GUIs) that
have been developed for R.
R Studio should really be considered an integrated
development environment (IDE), since it is aimed more toward
programming.
It allows easy publishing of reproducible documents such as reports,
interactive visualizations, presentations, and websites.
R Studio: A short tour
Initial Start
When you open R Studio for the very first time, you will see
three panels.

Console

- Every time you launch RStudio, it will have the same text at the top
of the console telling you the version of R that you’re running.
- Below that information is the prompt, >. As its
name suggests, this prompt is really a request: a request for a
command.
- Initially, interacting with R is all about typing commands and
interpreting the output.
- These commands and their syntax have evolved over decades
(literally) and now provide what many users feel is a fairly natural way
to access data and organize, describe, and invoke statistical
computations.
The console is where you type commands and have them immediately
performed.
Environment
The panel in the upper right contains your workspace (aka
Environment)

- This shows you a list of objects/variables that R has saved.
- For example, here a value of 3 has been assigned to the object a.
History
Up here there is an additional tab to see the history of the commands
that you’ve previously entered.

Files
The files tab allows you to open code/script files within R
studio.

Plots
Any plots that you generate will show up in the panel in the lower
right corner.

Help
To check the syntax of any function in R, type ? in front of the
function name to pull up the help file.

For example here I typed ?mean to get the help file for
the mean function. The help files are not always the most useful but are
usually a good place to start.
Script File
The top left is your editor window,
where you write code or scripts; the console is now at the bottom.
I usually change this layout.

The picture above illustrates my preferred style in R Studio.
R Script
Most R users submit commands to
R by typing in either the console or the editor panel, rather than
clicking a mouse in a Graphical User Interface (GUI).
In class, we will make extensive use of scripts. A
script is nothing but a collection of commands
and procedures that the coder performed to get to their results and
conclusions.
There are at least two advantages of doing so:
- It allows us to produce a whole set of results
at once by putting a collection of commands in a file.
- It is also a lot more transparent and straightforward to
share and replicate what you have
done.
This will always be our approach in this class!!!
RMarkdown
Your assignments will use a slightly different approach, as you will
be required to produce a .pdf file via an .Rmd
file. Think of this as an interactive version of the R script. Here, we
are able to embed our code and results with little effort. RMarkdown
focuses on reproducibility rather than on the static code presented in
your R scripts.
The next lesson covers RMarkdown files and their flexibility.
Exercise
Task 1: Create a script file
- Open R Studio and go to
File > New > R Script.
This will open a blank text document.
- Two alternative ways are:
- CTRL + SHIFT + N or
- press the button marked “+”, just below File, and select R
Script
- In the document, type
x = 5 # Assign the variable x a value of 5
x == 5 # Does x = 5? Notice the double ==
Highlight both lines of code and click the button marked “Run”.
If everything is working correctly, the console should display
TRUE.
Alternatively, press CTRL + ENTER (Windows or Linux) or COMMAND +
RETURN (macOS).
- Go to “File > Save As”, and choose a file name.
Part III: R Data types
Data types in R
You have observed a few of the different data types in the earlier
sections. Here, we will formally discuss them. Some of the most basic
data types we will cover are:
- Decimal values like 4.5 are called numerics.
- Natural numbers like 4 are called integers.
Integers are also numerics. (R stores a plain 4 as a numeric; write 4L to store it as an integer.)
- Boolean values (TRUE or FALSE) are also called
logical.
- Text (or string) values are called characters.
You can check the type of data by using class().
x <- "Lyrics to Virginia Tech Fight Song!"
class(x)
[1] "character"
x2 <- c("TRUE", "FALSE")
x2 <- as.logical(x2) #Declare the data type
x2
[1] TRUE FALSE
class(x2)
[1] "logical"
x <- 1:20
x %% 4 #x mod 4
[1] 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0
x %% 4 == 0
[1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
[12] TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
class(x %% 4 == 0)
[1] "logical"
Vectors
A vector is the most common and basic data type in R, and is pretty
much the workhorse of R. A vector is characterized by a series of
values, which can be either numbers or characters. We can assign a
series of values to a vector using the c() function.
In short, vectors are most useful when we have a collection of data
points.
Here c() stands for concatenate or
combine.
(v <- c(1, 2, 3, 4))
[1] 1 2 3 4
(v <- 1:4)
[1] 1 2 3 4
(v <- seq(from = 0, to = 0.5, by = 0.1))
[1] 0.0 0.1 0.2 0.3 0.4 0.5
#A vector can also contain characters:
(v_colors <- c("blue", "yellow", "light green") )
[1] "blue" "yellow" "light green"
Notice that by encasing the beginning and end of the assignment
lines in parentheses, we immediately print the stored values.
Subsetting vectors
(Indexing/reassigning elements)
We are able to index (collect subsets of our variables) by using
square brackets. Unlike Python, for example, R’s indexing begins from
1.
v_colors[2] # We are trying to extract the second element of the vector, v_colors
[1] "yellow"
v_colors[c(1,3)] # We can use the concatenation function to get nonconsecutive elements. Here, we are trying to extract elements in positions 1 and 3.
[1] "blue" "light green"
How would you extract elements 1:9, 15, 19, 20 and 21:30 in
zz below?
set.seed(1234)
zz <- rnorm(100)
Answer:
zz[c(1:9, 15, 19, 20:30)]
[1] -1.20706575 0.27742924 1.08444118 -2.34569770 0.42912469
[6] 0.50605589 -0.57473996 -0.54663186 -0.56445200 0.95949406
[11] -0.83717168 2.41583518 0.13408822 -0.49068590 -0.44054787
[16] 0.45958944 -0.69372025 -1.44820491 0.57475572 -1.02365572
[21] -0.01513830 -0.93594860
We can replace elements in specific positions. Below, we replace the
second and third colors with red and
purple.
(v_colors[2:3] <- c("red", "purple") )
[1] "red" "purple"
Sometimes it might be more convenient to get rid of particular
elements instead. For example, I might want to extract all
but the first 3 elements of a vector, or all but the
15th element. We might find it easier to use a negative index here.
j <- c(-1,-2,-3)
x[j]
[1] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
# We could have done that in one go as well
x[-c(1:3)]
[1] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Conditional subsetting
Another common way to subset is by using a logical vector.
TRUE will select the element with the same index, while
FALSE will not. Typically, these logical vectors are not
typed by hand, but are the output of other functions or logical tests
such as:
x <- 100:110
x
[1] 100 101 102 103 104 105 106 107 108 109 110
x > 105 # returns TRUE for each element that meets the condition, FALSE otherwise
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
select <- x > 105
x[select]
[1] 106 107 108 109 110
If we would like the elements that evaluate to FALSE
instead, we could easily use the ! (NOT)
operator.
x[!select]
[1] 100 101 102 103 104 105
You can combine multiple tests using:
- & (AND operator - both conditions are true) or
- | (OR operator - at least one of the conditions is true)
We can test whether x lies between 103 and 106 (inclusive):
x[x >= 103 & x <= 106]
[1] 103 104 105 106
x is greater than 103 but (AND) less than or equal to
106
x[x <= 106 & x > 103] # the order of the conditions does not matter here!
[1] 104 105 106
x is less than 103 or greater than or equal to 106
x[x >= 106 | x < 103]
[1] 100 101 102 106 107 108 109 110
Sometimes we will need to search for certain strings in a vector.
With multiple conditions, it becomes difficult to use the “OR” operator
|. The function %in% allows you to test if any
of the elements of a search vector are found:
animals <- c("mouse", "rat", "dog", "cat")
animals[animals == "cat" | animals == "rat"] # returns both rat and cat
[1] "rat" "cat"
animals %in% c("rat", "cat", "dog", "duck", "goat")
[1] FALSE TRUE TRUE TRUE
animals[animals %in% c("rat", "cat", "dog", "duck", "goat")]
[1] "rat" "dog" "cat"
Names of a vector
Let’s say that we want to know which color robe each of 3 patients is
wearing. We can assign names to the vector of colors.
[1] "blue" "red" "purple"
names(v_colors) <- c("Thomas", "Liz", "Tucker")
v_colors
Thomas Liz Tucker
"blue" "red" "purple"
Algebraic Operations of Vectors
x <- c(1,2,3)
y <- c(4,5,6)
# component-wise addition
x+y
[1] 5 7 9
# component-wise multiplication
x*y
[1] 4 10 18
# What happens to the following
y^x # or y**x
[1] 4 25 216
Repeating Vector in R
# Would this work?
c(1,2,3,4) + c(1,2)
[1] 2 4 4 6
# Would this work?
c(1,2,3) + c(1,2)
Warning: longer object length is not a multiple of shorter object length
[1] 2 4 4
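Why the weird results?
- When you add vectors of unequal length, R automatically repeats
(recycles) the shorter one until it matches the length of the longer one.
- If the longer length is not a multiple of the shorter length,
R still recycles the shorter vector but issues a warning.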
[1] 2 4 6
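To make the recycling rule concrete, here is a minimal sketch. The names `long`, `short`, and `recycled` are made up for this illustration; `rep()` builds by hand the vector that R would otherwise create silently.

```r
long  <- c(1, 2, 3, 4)
short <- c(1, 2)

# Repeat the short vector until it is as long as the long one: 1 2 1 2
recycled <- rep(short, length.out = length(long))

# Both additions give the same answer
sum_auto   <- long + short
sum_manual <- long + recycled
sum_auto
sum_manual
```

Writing the repetition out with `rep()` is rarely necessary, but it is a handy way to check what R is doing when a result looks odd.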
Matrix
Create a new matrix
(matrix<-matrix(1:16, nrow = 4, byrow = TRUE))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
[4,] 13 14 15 16
Note that 1:16 means every integer from 1 to 16. In the
matrix() function:
- The first argument is the collection of elements that R
will arrange into the rows and columns of the matrix. Here, we use 1:16
which is a shortcut for c(1, 2, 3, 4, …, 16).
- The argument byrow indicates that the matrix is filled
by the rows. If we want the matrix to be filled by the columns, we use
byrow = FALSE.
- The argument nrow indicates that the matrix should have
4 rows.
Selection of Matrix Elements
Selection of matrix elements is similar to vectors, except we
have two dimensions over which to subset: rows and columns.
# matrix[r,c] #Standard form of the matrix.
matrix[1,2] #Extract element in the first row and second column
[1] 2
#Extract the entire first and second columns
matrix[,1:2]
[,1] [,2]
[1,] 1 2
[2,] 5 6
[3,] 9 10
[4,] 13 14
Assign dimension names to Matrix
rownames(matrix) <- c("Yes", "No", "Perhaps", "Maybe")
colnames(matrix) <- c("Apple", "Pear", "Banana", "Grapes")
matrix
Apple Pear Banana Grapes
Yes 1 2 3 4
No 5 6 7 8
Perhaps 9 10 11 12
Maybe 13 14 15 16
Dimension of a matrix vs vector
x <- c(1,2,3)
matrix<-matrix(1:4, byrow = TRUE, nrow = 2)
length(x)
[1] 3
length(matrix)
[1] 4
dim(matrix)
[1] 2 2
dim(x)
NULL
Lists
R doesn’t like vectors to have different types:
c(TRUE, 1, "Frank") becomes
c("TRUE", "1", "Frank"). But storing objects with different
types is absolutely fundamental to data analysis. Besides the vector, R has
another type of object that stores data of
different types side by side: a list.
c(TRUE, 1, "Frank")
[1] "TRUE" "1" "Frank"
x <- list(TRUE, 1, "Frank")
x
[[1]]
[1] TRUE
[[2]]
[1] 1
[[3]]
[1] "Frank"
Many different things, not necessarily of the same length, can be put
together.
x <- list(c(1:5), c("a", "b","c"), c(TRUE, FALSE), c(5L, 6L))
x
[[1]]
[1] 1 2 3 4 5
[[2]]
[1] "a" "b" "c"
[[3]]
[1] TRUE FALSE
[[4]]
[1] 5 6
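Once a list exists, we usually want to pull items back out of it. The sketch below uses a small named list, `person`, invented for this illustration, to show the difference between double and single square brackets.

```r
# A hypothetical named list mixing character, numeric, and logical values
person <- list(name = "Frank", age = 1, member = TRUE)

person[["name"]]  # double brackets return the element itself
person$age        # the $ shortcut does the same for named elements
person["name"]    # single brackets return a (shorter) list

name_value <- person[["name"]]
name_list  <- person["name"]
class(name_value)
class(name_list)
```

Double brackets (or `$`) are what we want when the value itself is needed for a calculation; single brackets keep the result wrapped in a list.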
Dataframes
- Data frames are like spreadsheet data, rectangular with rows and
columns.
- Ideally each row represents data on a single observation and each
column contains data on a single variable, or characteristic, of the
observation.
- It represents the data in a tabular format where the columns are
vectors that all have the same length. Because columns are vectors, each
column must contain a single type of data (e.g., characters, integers,
factors).
- We can open a data viewer window to see the contents of
R’s iris data frame by typing View(iris).
- We will be working with spreadsheets a lot.
Create a data frame
Data frame with Harry Potter characters
name <- c("Harry", "Ron", "Hermione", "Hagrid", "Voldemort")
height <- c(176, 175, 167, 230, 180)
gpa <- c(3.4, 2.8, 4.0, 2.2, 3.4)
df_students <- data.frame(name, height, gpa)
df_students
Alternative way of creating DF
df_students <- data.frame(name = c("Harry", "Ron", "Hermione", "Hagrid", "Voldemort"),
height = c(176, 175, 167, 230, 180),
gpa = c(3.4, 2.8, 4.0, 2.2, 3.4))
df_students
Adding variable
df_students$good <- c(1, 1, 1, 1, 0)
df_students
Features of the
DF
dim(df_students)
df_students[2, 3] #Ron's GPA
df_students$gpa[2] #Ron's GPA
df_students[5, ] #get row 5
df_students[3:5, ] #get rows 3-5
df_students[, 2] #get column 2 (height)
df_students$height #get column 2 (height)
df_students[, 1:3] #get columns 1-3
df_students[4, 2] <- 255 #reassign Hagrid's height
df_students$height[4] <- 255 #same thing as above
df_students
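The conditional subsetting we used on vectors carries over to data frames. As a minimal sketch, the block below rebuilds the first three columns of `df_students` so that it runs on its own and then keeps only the rows with a GPA above 3; the name `high_gpa` is made up for this illustration.

```r
df_students <- data.frame(
  name   = c("Harry", "Ron", "Hermione", "Hagrid", "Voldemort"),
  height = c(176, 175, 167, 230, 180),
  gpa    = c(3.4, 2.8, 4.0, 2.2, 3.4)
)

# The logical test goes in the row position; the empty column
# position means "keep every column"
high_gpa <- df_students[df_students$gpa > 3, ]
high_gpa

# str() gives a compact summary of each column and its type
str(df_students)
```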
Exercise
Now that you are equipped with the basics, go ahead and take the
R Intro course on Datacamp. Your invitations should now be in your inbox.
Part IV: Working directories
You can use the getwd()
command to obtain the current directory R is using.
It is good practice to set the working directory location to where
the files and data are stored.
- Consider setting your working directory to a folder called AAEC4984,
AAEC5484, or STAT5484 on your desktop (for example).
Creating Directory and Set working directory
Windows
setwd("C:/users/[your user name]/Desktop/AAEC4984/")
# OR
setwd("C:\\users\\[your user name]\\Desktop\\AAEC4984\\")
# notice the double backslashes
Mac
setwd("~/Desktop/AAEC4984")
- To check whether the wd is correct, we again use getwd().
- To obtain a list of the names of files or folders in the working
directory, we can use list.files().
- To create a new folder in your directory we can use
dir.create("[Folder name]")
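The steps above can be tried safely without touching your Desktop. This sketch works inside R's temporary directory; the folder name `AAEC4984_demo` and the variable names are assumptions made for the illustration.

```r
# Build a path inside the session's temporary directory
base_dir    <- tempdir()
project_dir <- file.path(base_dir, "AAEC4984_demo")

# Create the folder (no warning if it already exists)
dir.create(project_dir, showWarnings = FALSE)

# setwd() returns the previous working directory, so we can restore it later
old_wd <- setwd(project_dir)
getwd()       # now points at the new folder
list.files()  # lists the files in the new folder

# Go back to where we started
setwd(old_wd)
```

`file.path()` joins folder names with the right separator for your operating system, which avoids the forward-slash versus double-backslash issue shown for Windows.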
Importing data
R allows us to import several file types. I will discuss
the 3 that we are most likely to use in this course.
- Text files
Data sometimes come with headers (the first row is variable names, not
actual data!). You need to tell R that!
textdata <- read.table("examples/hogsdata.txt", header = TRUE)
- CSV files
- xlsx files (requires the openxlsx package)
xlsxdata <- openxlsx::read.xlsx("examples/hogsdata.xlsx", ... )
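The CSV case has no example above, so here is a self-contained sketch. It writes a small made-up data frame (`toy`) to a temporary file and reads it back with `read.csv()`. The file, the column names, and the values are illustrative assumptions, not course data.

```r
# A tiny made-up data set
toy <- data.frame(week = 1:3, price = c(50.2, 51.7, 49.9))

# Write it to a temporary CSV file (no row names column)
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp, row.names = FALSE)

# Read it back; header = TRUE tells R the first row holds variable names
csvdata <- read.csv(tmp, header = TRUE)
head(csvdata)

# Clean up the temporary file
unlink(tmp)
```

With a real file, the first argument of `read.csv()` would simply be the path to that file relative to your working directory.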
Functions & Packages
Functions are “canned scripts” that automate more complicated sets of
commands, including operations, assignments, etc. For the purpose of this
course, we will use a lot of functions that are either built into base R
(that is, they are predefined) or available through R packages (discussed
below).
A function usually takes one or more inputs called
arguments, and often (but not always) returns a
value.
Consider for example, taking the average of a set of random numbers
(x).
set.seed(124)
x <- rnorm(6) * 100
(round(x, digits=2)) # round function => 2dp
[1] -138.51 3.83 -76.30 21.23 142.55 74.45
If we were to do this manually, we would:
- Sum up the values
- Get the number of observations
- Divide sum by total number of observations
Using R’s built in mean function we can do
all three steps internally and cross check against our manual
calculations.
[1] 4.542439
meanx == mean(x) # cross-check
[1] TRUE
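The three manual steps can be sketched as follows. `meanx` is the name used above; `total` and `n` are names chosen for this illustration. Comparing decimal numbers with `==` can occasionally fail because of rounding, so the sketch uses `all.equal()`, which allows a tiny tolerance.

```r
set.seed(124)
x <- rnorm(6) * 100

total <- sum(x)     # Step 1: sum up the values
n     <- length(x)  # Step 2: number of observations
meanx <- total / n  # Step 3: divide the sum by the count

# Cross-check against the built-in function
isTRUE(all.equal(meanx, mean(x)))
```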
Installing Packages
Since R is open-source software, thousands of
people contribute to it. They do this by writing commands
(called functions) to make a particular analysis easier, or to make a
graphic prettier.
When you download R, you get access to a lot of
functions that we will use. However, the other user-written packages we
use for our analyses will make our lives much easier.
For example, though we can use the plot command for
standard graphics, you will quickly see that we can get much better
looking time series graphs using the fpp3 package (which
also uses, among other packages, ggplot2).
To install the fpp3 package, we can use the command
install.packages("fpp3")
We will need to install a package only once in R.
Now that you have the fpp3 package installed, we can
check to see if it is in use
Lastly, in order to use the package, we will need to load the
library
library(fpp3)
Using libraries
The fpp3 package contains a number of useful datasets.
One such data set is us_gasoline (Weekly US finished motor
gasoline product supplied).
Use the help() function to get a description of this
data. Try help(us_gasoline)
Now let us create a nice plot of the us_gasoline
data.
autoplot(us_gasoline, col = "darkgreen") +
#Now to add axis labels
labs(title = "Weekly US finished motor gasoline product supplied",
y = "Mbd", caption = "Source: fpp3 package") +
theme_light() # One of many themes in R

Let us leave it there for now!
---
title: "Short Introduction to R"
subtitle: 'Tutorial on R Studio'
# logo: ../images/rstudio-start.png
author: Applied Economic Forecasting
#institute: |
#  | Department of Agricultural & Applied Economics
#  | Virginia Tech
# date: "1/18/2021"
output: 
  html_notebook:
    theme: journal
    highlight: zenburn
    toc: yes
    toc_depth: 3
    toc_float: yes
    number_sections: true
    fonttheme: "serif"
    code_folding: show
---

```{r setup, echo=FALSE, include=FALSE, warning=FALSE} 
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
require(fpp3)
theme_set(theme_bw())
```

# Part I: `R` Introduction

## What is `R`?

`R` is a **programming language** used for statistical analysis and graphics. It is an open-source implementation of S, a programming language originally developed at AT&T's Bell Laboratories; S-PLUS is a commercial implementation of the same language.

## Why `R`?

- Open source, cross-platform, and **free**
- Great for reproducibility
- Interdisciplinary and extensible
- Tons of learning resources
- Works on data of all shapes and sizes
- Produces high-quality graphics
- Large and welcoming community


## `R`: Object-Oriented Programming

Unlike many other statistical software packages, such as SAS and SPSS, `R` will not spit out a mountain of output on the screen.

Instead, `R` returns an **object** containing all the results. You, as a user, have the flexibility to choose which results to extract or report.


## `R`: Functional Programming

This feature allows us to write faster yet more compact code. For example, a common theme in `R` programming is **avoidance of explicit iteration**. Unlike in many other statistical software packages, explicit loops are discouraged.


Instead, `R` provides functions that allow us to express iterative behavior implicitly.
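
As a minimal sketch of what this means in practice, the three snippets below compute the same squares: first with an explicit loop, then with a vectorized operator, and finally with `sapply()`. The variable names (`nums`, `squares_loop`, `squares_vec`, `squares_apply`) are made up for illustration.

```r
nums <- 1:5

# Explicit loop: square each number one at a time
squares_loop <- numeric(length(nums))
for (i in seq_along(nums)) {
  squares_loop[i] <- nums[i]^2
}

# Implicit iteration: arithmetic operators work on whole vectors
squares_vec <- nums^2

# sapply() applies a function to every element and simplifies the result
squares_apply <- sapply(nums, function(n) n^2)

squares_loop
squares_vec
squares_apply
```

All three give the same answer; the second and third versions are shorter and closer to how most `R` code is written.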


## `R`: Polymorphic

`R` is also *polymorphic*, which means that a single function can be applied to different types of inputs (much more user friendly).

Such a function is called a *generic function* (If you are a C++ programmer, you have seen a similar concept in *virtual functions*).

## Polymorphic - Example

Let's look at one example: `plot()`

1. Plot a vector of numbers
2. Plot some model results

No matter which purpose, we use the same function.

```{r, echo=TRUE, fig.height=4}
dats <- c(1,2,3,4)
plot(dats)
```


```{r, echo=TRUE, fig.show="hold", fig.height=5}
# Regression Analysis
par(mfrow=c(2,2),mar=c(2,4,2,2))
results <- lm(speed ~ dist,data=cars)
plot(results)
```
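
Why does the same call behave so differently? A generic function looks at the class of its input and hands the work to a matching method. The sketch below first checks the classes of the two objects plotted above and then builds a toy generic the same way. Note that `describe()` and its methods are hypothetical, written only for this illustration; they are not part of `R`.

```r
dats    <- c(1, 2, 3, 4)
results <- lm(speed ~ dist, data = cars)

class(dats)     # "numeric"
class(results)  # "lm"

# A hypothetical generic: UseMethod() picks a method based on the class
describe <- function(x, ...) UseMethod("describe")
describe.numeric <- function(x, ...) paste("a numeric vector of length", length(x))
describe.lm      <- function(x, ...) "a fitted linear model"
describe.default <- function(x, ...) "some other object"

describe(dats)
describe(results)
describe("text")
```

`plot()` follows the same pattern: one function name, with a separate method chosen for each class of input.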


## Why R Studio?

- The default `R` interface is **ugly**!


- Many students in this class are much more familiar with the Windows operating system and have never been exposed to programming before, so we will use R Studio, one of the free Graphical User Interfaces (GUIs) that have been developed for `R`.

- R Studio should really be considered an *integrated development environment* (IDE), since it is aimed more toward programming.

- It allows easy publishing of reproducible documents such as reports, interactive visualizations, presentations, and websites.

## R Studio: A short tour

**Initial Start**

When you open R Studio for the very first time, you will see three panels.


![](../images/rstudio-start.png)

---

**Console**

![](../images/rstudio-console.png){scale=30%}

1. Every time you launch RStudio, it will have the same text at the top of the console telling you the version of R that you're running.
2. Below that information is the prompt, `>` . As its name suggests, this prompt is really a request, a request for a command.
3. Initially, interacting with R is all about typing commands and interpreting the output.
4. These commands and their syntax have evolved over decades (literally) and now provide what many users feel is a fairly natural way to access data and organize, describe, and invoke statistical computations.


The console is where you type commands and have them immediately performed.

---

**Environment**

The panel in the upper right contains your workspace (aka Environment)

![](../images/rstudio-env.png)

1. This shows you a list of objects/variables that R has saved.
2. For example here a value of `3` has been assigned to the object `a`.

---

**History**

Up here there is an additional tab to see the history of the commands that you've previously entered.



![](../images/rstudio-history.png)


---

**Files**

The files tab allows you to open code/script files within R studio.


![](../images/rstudio-files.png)

---

**Plots**

Any plots that you generate will show up in the panel in the lower right corner.


![](../images/rstudio-plot.png)

---

**Help**

To check the syntax of any function in R, type ? in front of the function name to pull up the help file.


![](../images/rstudio-help.png)

For example here I typed `?mean` to get the help file for the mean function. The help files are not always the most useful but are usually a good place to start.

---

**Script File** The top left is your editor window, where you write code or scripts; the console is now at the bottom. **I usually change this layout.**

![](../images/rstudio-changed.png)

The picture above illustrates my preferred style in R Studio.


## R Script

Most `R` users submit commands to `R` by typing in either the console or the editor panel, rather than clicking a mouse in a Graphical User Interface (GUI).

In class, we will make extensive use of scripts. A **script** is nothing but **a collection of commands and procedures that the coder performed to get to their results and conclusions.**

There are at least two advantages of doing so:

1. It allows us to produce a whole set of results at once by putting a collection of commands in a file.
2. It is also a lot more transparent and straightforward to **share** and **replicate** what you have done.

<p align="center"> <b> This will always be our approach in this class!!! </b> </p>


## RMarkdown

Your assignments will use a slightly different approach, as you will be required to produce a `.pdf` file via an `.Rmd` file. Think of this as an interactive version of the R script. Here, we are able to embed our code and results with little effort. 
RMarkdown focuses on reproducibility rather than on the static code presented in your R scripts.

The next lesson covers RMarkdown files and their flexibility.

---

## Exercise

### **Task 1:** Create a script file

1. Open R Studio and go to `File > New > R Script`.

This will open a blank text document.

- Two alternative ways are: 
  - **CTRL + SHIFT + N** or
  - **press the button marked "+", just below File, and select R Script**

2. In the document, type


```{r, eval=FALSE}
x = 5  # Assign the variable x a value of 5
x == 5  # Does x = 5? Notice the double ==
```

- Highlight both lines of code and click the button marked "Run". If everything is working correctly, the console should display `TRUE`.

- Alternatively, press **CTRL + ENTER** (Windows or Linux) or **COMMAND + RETURN** (macOS).

3. Go to "File > Save As", and choose a file name.

---

# Part II: Working with Scripts

## Comments

Whenever possible, use comments! Anything following the symbol `#` in an **R Script** will not be run in R.

Comments are notes we leave ourselves so we know:

  - exactly who wrote the code (important in companies where many people may work on a project)
  - the purpose of the code!
  - what our thought process was at a particular line of code.

I promise that this will become useful when you come back to your code after an extended time. I cannot tell you the number of times I have had a moment of pure genius while coding, only to spend hours on a different day trying to understand why I coded it like that or what I actually did.

---

For example, below is the type of comment that I always include in my programs:
```
# Project: 'Tutorial on R Studio II'
# Author:  Shamar Stewart
# This program illustrates some basic programming philosophy
# and R operations
```

You can also understand the following code without even knowing what exactly each line of command does because I tell you what they are!

```{r}
# Set seed number so that all the results based on random samples
# are reproducible.
  set.seed(12345)
# Then create a normally distributed random variable, x, with 500
# observations.
  x <- rnorm(500)
# Notice "<-" is the universal assignment operator in R (I prefer this to "=")
```

---

## Exercise
### **Task 2:**

At the top of the previous script (Task 1), add and expand on the following comments:

  1. The project
  2. The author
  3. The purpose of this program

**Follow the example given above.**

---

## R Basics

**Arithmetic**

```{r}

  1 + 1 #add numbers

  8 - 4 #subtract them

  13/2 #divide

  4*pi #multiply (pi is a built-in constant in R)

  2^10 #exponentiate
```

---

## Logical Comparison

Logical comparisons result in a value of `TRUE` or `FALSE`.


```{r}
  3 < 4
  3 > 4
  3 == 4
  3 != 4
  10 - 6 == 4

  # Notice the difference between single and double equal signs
```

Now try `3 = 4`. What is the result here?

## Strings (text)

```{r}
#R delimits strings with EITHER double or single quotes.
#There is only a very minimal difference

message1 <- 'Let us get to coding!'
message2 <-  "Please get to coding!"
print(message1)
print(message2)
```

In R, we can also print the result(s) stored in our variables by simply running the variable name instead of `print()`.

```{r}
message1
```


## Variables

- Variables are used to store values and results. Assignment to a variable happens from right to left: the value on the right side gets assigned to the name on the left side. You can use nearly anything as a variable name in R. The only rules are:

1. "." and "_" may be used in variable names, but no other symbols.

2. Your variable name must not start with a number or "_" (`2squared` and  `_one` are illegal).

- A note for those of you who have programming experience: while R supports object-oriented programming, periods "." do not have a special meaning in the language. For historical reasons, R programmers often use periods in place of underscores in variable names, but either works. Just be consistent to keep your code readable.

- `R` is case sensitive. Capitalization of variable names matters.

```{r}
    x <- 42
    x / 2

    # redefine x
    x <- x + 3
    x

    #if we assign something else to x, the old value is deleted
    x <- "Hokies!"
    x

    foo <- 3
    bar <- 5
    foo.bar <- foo + bar
    foo.bar
```
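
A quick sketch of the naming rules and case sensitivity. The variable names `rate` and `Rate` are made up for this illustration; `make.names()` is a base `R` helper that shows how `R` would repair a name that breaks the rules.

```r
rate <- 0.05
Rate <- 5       # a different variable, because R is case sensitive
rate == Rate    # FALSE

# Valid names are returned unchanged; invalid ones are repaired
fixed <- make.names(c("growth_rate", "2squared", "_one"))
fixed
```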


---

## Exercise

### **Task 3:**

1. Create a variable called `entry` that stores the year you started at Virginia Tech.
2. Store the current year to a variable called `current_t`.
3. Compute the difference between `current_t` and `entry`. Store this as `diffs`.
4. Store your birth year as `my_year`. Now compute the difference between `current_t` and `my_year`. Assign the results to `my_diffs`.
5. Use this information to compute the percentage of your life you have spent at this university. Be sure to use brackets if you need them.
6. Assign this result to a variable of your choosing.

---

## Clearing the memory

To remove all variables in memory:

```{r}
#    ls() # List of all variables in memory
    rm(list=ls())
```

- **I usually place this at the beginning of my `R` script (just after the document details).** It is usually a good idea to clear your memory after you've been doing a lot of debugging. This ensures that your code will work for others and does not depend on variables you created along the way but didn't actually include as a part of your script.


# Part III: `R` Data types

## Data types in `R`

You have observed a few of the different data types in the earlier sections. Here, we will formally discuss them. Some of the most basic data types we will cover are:

  1. **Decimal values** like 4.5 are called numerics.
  2. **Natural numbers**  like 4 are called integers. Integers are also numerics. (`R` stores a plain `4` as a numeric; write `4L` to store it as an integer.)
  3. **Boolean values**  (TRUE or FALSE) are also called logical.
  4. **Text**  (or string) values are called characters.


You can check the type of data by using `class()`.

```{r}
  x <- "Lyrics to Virginia Tech Fight Song!"
  class(x)

  x2 <- c("TRUE", "FALSE")
  x2 <- as.logical(x2) #Declare the data type
  x2
  class(x2)

  x <- 1:20
  x %% 4 #x mod 4
  x %% 4 == 0
  class(x %% 4 == 0)
```
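
One detail is easy to miss: typing a whole number does not by itself create an integer. As a small sketch (the names `a` and `b` are arbitrary), compare a plain `4` with `4L`:

```r
a <- 4    # stored as a double, which R reports as "numeric"
b <- 4L   # the L suffix asks R for an integer

class(a)       # "numeric"
class(b)       # "integer"
is.numeric(b)  # TRUE: integers also count as numeric
typeof(4.5)    # "double"
class(TRUE)    # "logical"
```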


## Vectors


A vector is the most common and basic data type in R, and is pretty much the workhorse of R. A vector is characterized by a series of values, which can be either numbers or characters. We can assign a series of values to a vector using the `c()` function.

In short, vectors are most useful when we have a collection of data points.

Here `c()` stands for **concatenate** or **combine**.


```{r}
(v <- c(1, 2, 3, 4))
(v <- 1:4)
(v <- seq(from = 0, to = 0.5, by = 0.1))

#A vector can also contain characters:
(v_colors <- c("blue", "yellow", "light green")	)

```

<b> Notice that by encasing the beginning and end of the assignment lines in parentheses, we immediately print the stored values. </b>

## Subsetting vectors (Indexing/reassigning elements)

We are able to index (collect subsets of our variables) by using square brackets. Unlike Python, for example, R's indexing begins from 1.

```{r}
v_colors[2] # We are trying to extract the second element of the vector, v_colors
v_colors[c(1,3)]  # We can use the concatenation function to get nonconsecutive elements. Here, we are trying to extract elements in positions 1 and 3.

```

How would you extract elements 1:9, 15, 19, 20 and 21:30 in `zz` below?

```{r}
set.seed(1234)
zz <- rnorm(100)
```

**Answer:**

```{r}
zz[c(1:9, 15, 19, 20:30)]
```

We can replace elements in specific positions. Below, we replace the second and third colors with `red` and `purple`.

```{r}
(v_colors[2:3] <- c("red", "purple"))
```

Sometimes it might be more convenient to get rid of particular elements instead. For example, I might want to extract all **but** the first 5 elements of a vector, or all but the 15th element. We might find it easier to use a negative index here.

```{r}
# x is still 1:20 from the data types section
j <- c(-1, -2, -3)
x[j] # drop the first three elements

# We could have done that in one go as well
x[-c(1:3)]
```
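
The two cases mentioned above (all but the first 5 elements, and all but the 15th element) can be sketched in the same way. The vector `scores` below is a hypothetical example made up for illustration.

```{r}
scores <- seq(10, 200, by = 10)  # hypothetical vector of 20 values
scores[-(1:5)]        # everything except the first 5 elements
scores[-15]           # everything except the 15th element
length(scores[-15])   # one element shorter than the original
```

Note that positive and negative indices cannot be mixed in a single pair of brackets: something like `scores[c(-1, 2)]` produces an error.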

## Conditional subsetting

Another common way to subset is by using a logical vector. `TRUE` will select the element with the same index, while `FALSE` will not. Typically, these logical vectors are not typed by hand, but are the output of other functions or logical tests such as:

```{r}
x <- 100:110
x

x > 105 # returns TRUE or FALSE for each element, depending on whether it meets the condition
select <- x > 105
x[select]
```
If we would like the elements that evaluate to `FALSE` instead, we can use the `!` (`NOT`) operator:

```{r}
x[!select]
```
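
Logical vectors are useful for more than subsetting. Because `R` treats `TRUE` as 1 and `FALSE` as 0 in arithmetic, we can also count how many elements pass a test. The sketch below uses a made-up vector `temps` and a made-up name `is_high`, chosen only for illustration.

```{r}
temps <- c(98, 104, 101, 110, 95)   # hypothetical measurements
is_high <- temps > 100
sum(is_high)      # how many elements pass the test (TRUE counts as 1)
which(is_high)    # their positions in the vector
mean(is_high)     # the share of elements that pass
```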

---

You can combine multiple tests using:

- `&` (`AND` operator - both conditions are true) or

- `|` (`OR` operator - __at least one__ of the conditions is true)

We can select the elements of x that lie between 103 and 106 (inclusive):

```{r}
x[x >= 103 & x <= 106]
```

x is greater than 103 but (`AND`) less than or equal to 106

```{r}
x[x <= 106 & x > 103] # the order of the two conditions does not matter here!

```

x is less than 103 `OR` greater than or equal to 106
```{r}
x[x >= 106 | x < 103]

```

---

Sometimes we will need to search for certain strings in a vector. With many candidate values, chaining conditions with the "OR" operator `|` becomes tedious. The `%in%` operator tests, for each element of the vector on its left, whether it appears anywhere in the vector on its right:

```{r}
animals <- c("mouse", "rat", "dog", "cat")
animals[animals == "cat" | animals == "rat"] # returns both rat and cat

animals %in% c("rat", "cat", "dog", "duck", "goat")
animals[animals %in% c("rat", "cat", "dog", "duck", "goat")]
```


## Names of a vector

Let's say that we want to record which color robe each of 3 patients is wearing. We can assign names to the vector of colors:
```{r, echo=TRUE}
v_colors
names(v_colors) <- c("Thomas", "Liz", "Tucker")
v_colors
```
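
Once a vector has names, we can subset by name instead of by position. The sketch below builds its own small named vector (`robes`, a hypothetical example) so that it runs on its own.

```{r}
robes <- c(Thomas = "blue", Liz = "red", Tucker = "purple")  # hypothetical named vector
robes["Liz"]                  # look up one patient by name
robes[c("Tucker", "Thomas")]  # several names, in the order requested
names(robes)[robes == "red"]  # who is wearing red?
unname(robes["Liz"])          # drop the name and keep only the value
```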

## Algebraic Operations of Vectors

```{r, echo=TRUE}
x <- c(1,2,3)
y <- c(4,5,6)
# component-wise addition
x+y
# component-wise multiplication
x*y
# What happens with the following?
y^x # component-wise power; y**x is equivalent
```


## Repeating Vector in `R`

```{r, warning=TRUE, echo=TRUE}
# Would this work?
c(1,2,3,4) + c(1,2)
# Would this work?
c(1,2,3) + c(1,2)
```

*Why the weird results?*

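- When you add vectors of unequal length, `R` automatically repeats (recycles) the shorter one until it matches the length of the longer one. If the longer length is a multiple of the shorter, this happens silently; if it is not, `R` still recycles but issues a warning.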

```{r, echo = TRUE}
2*c(1,2,3)
```
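
To see what `R` is doing behind the scenes, we can write the repetition out by hand with `rep()`. This is only a sketch; the names `short_vec` and `long_vec` are made up for illustration.

```{r}
short_vec <- c(1, 2)
long_vec  <- c(1, 2, 3, 4)
long_vec + short_vec                                      # short_vec is recycled to 1, 2, 1, 2
long_vec + rep(short_vec, length.out = length(long_vec))  # the same sum, written out
identical(long_vec + short_vec,
          long_vec + rep(short_vec, length.out = length(long_vec)))
```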

## Matrix

**Create a new matrix**
```{r}
(matrix<-matrix(1:16, nrow = 4, byrow = TRUE))
```

Note that `1:16` means every integer from 1 to 16. In the `matrix()` function:

1. The first argument is the collection of elements that `R` will arrange into the rows and columns of the matrix. Here, we use 1:16 which is a shortcut for c(1, 2, 3, 4, ... 16).
2. The argument `byrow` indicates that the matrix is filled by the rows. If we want the matrix to be filled by the columns, we use `byrow = FALSE`.
3. The argument `nrow` indicates that the matrix should have 4 rows.
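
The effect of `byrow` is easiest to see side by side. The sketch below fills two small matrices with the same numbers; the names `m_by_row` and `m_by_col` are made up for illustration.

```{r}
m_by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # fill across each row
m_by_col <- matrix(1:6, nrow = 2, byrow = FALSE)  # fill down each column (the default)
m_by_row
m_by_col
m_by_row[1, ]   # first row: 1 2 3
m_by_col[1, ]   # first row: 1 3 5
```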

## Selection of Matrix Elements
Selecting matrix elements is similar to selecting vector elements, except that we have two dimensions over which to subset: rows and columns.

```{r}

# matrix[r,c] #Standard form of the matrix.

matrix[1,2] #Extract element in the first row and second column
#Extract the entire first and second columns
matrix[,1:2]
```


## Assign dimension names to Matrix

```{r}
	rownames(matrix) <- c("Yes", "No", "Perhaps", "Maybe")
	colnames(matrix) <- c("Apple", "Pear", "Banana", "Grapes")
	matrix
```

## Dimension of a matrix vs vector

```{r}
x <- c(1,2,3)
matrix<-matrix(1:4, byrow = TRUE, nrow = 2)
length(x)
length(matrix)
dim(matrix)
dim(x)
```


## Lists

`R` doesn’t like vectors to mix types: `c(TRUE, 1, "Frank")` is coerced to `c("TRUE", "1", "Frank")`. But storing objects with different types is absolutely fundamental to data analysis. For that, `R` has another kind of object that stores values of different types side by side: the list.

```{r}
c(TRUE, 1, "Frank")
x <- list(TRUE, 1, "Frank")
x
```

Elements of different types, and not necessarily of the same length, can be put together in one list.

```{r}
x <- list(c(1:5), c("a", "b","c"), c(TRUE, FALSE), c(5L, 6L))
x
```
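
Getting elements back out of a list works a little differently from vectors. The sketch below uses a hypothetical named list (`patient`) to contrast single brackets, double brackets, and `$`.

```{r}
patient <- list(name = "Frank", age = 41, smoker = TRUE)  # hypothetical named list
patient[["name"]]        # double brackets return the element itself
patient$age              # $ is shorthand for [[ ]] with a name
patient["age"]           # single brackets return a smaller list
class(patient["age"])
class(patient[["age"]])
sapply(patient, class)   # each element keeps its own type
```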


## Dataframes

- Data frames are like spreadsheet data, rectangular with rows and columns.
- Ideally each row represents data on a single observation and each column contains data on a single variable, or characteristic, of the observation.
- It represents the data in a tabular format where the columns are vectors that all have the same length. Because columns are vectors, each column must contain a single type of data (e.g., characters, integers, factors).
- We will be working with spreadsheets a lot.
- We can open a data viewer window to see the contents of `R`'s `iris` data frame by typing:

```{r, eval=FALSE, echo=TRUE}
View(iris)
```


## Create a data frame

Data frame with Harry Potter characters

```{r}
name <- c("Harry", "Ron", "Hermione", "Hagrid", "Voldemort")
height <- c(176, 175, 167, 230, 180)
gpa <- c(3.4, 2.8, 4.0, 2.2, 3.4)
df_students <- data.frame(name, height, gpa)
df_students
```

---

An alternative way of creating the data frame:

```{r}
	df_students <- data.frame(name = c("Harry", "Ron", "Hermione", "Hagrid", "Voldemort"),
				  height = c(176, 175, 167, 230, 180),
				  gpa = c(3.4, 2.8, 4.0, 2.2, 3.4))
	df_students
```


## Adding variable

```{r, echo = TRUE}
	df_students$good <- c(1, 1, 1, 1, 0)
	df_students
```


## Features of the DF

```{r, eval=FALSE, echo=TRUE}
	dim(df_students)
	df_students[2, 3]               #Ron's GPA
	df_students$gpa[2]              #Ron's GPA
	df_students[5, ]                #get row 5
	df_students[3:5, ]              #get rows 3-5
	df_students[, 2]                #get column 2 (height)
	df_students$height              #get column 2 (height)
	df_students[, 1:3]              #get columns 1-3
	df_students[4, 2] <- 255        #reassign Hagrid's height
	df_students$height[4] <- 255    #same thing as above
	df_students
```
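
The conditional subsetting we used on vectors carries over to data frames: a logical test on a column picks out rows. The sketch below builds its own small data frame (`df_demo`, with made-up values) so that it does not touch `df_students`.

```{r}
df_demo <- data.frame(name = c("Ann", "Ben", "Cy"),
                      height = c(160, 182, 175),
                      gpa = c(3.9, 2.5, 3.2))   # hypothetical data
str(df_demo)                         # column names and types at a glance
df_demo[df_demo$gpa > 3, ]           # rows where gpa exceeds 3
df_demo[df_demo$gpa > 3 & df_demo$height > 170, "name"]  # combine tests, keep one column
nrow(df_demo)
ncol(df_demo)
```

Do not forget the comma inside the brackets: the test goes before it (rows), and the column selection goes after it.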


## Exercise

Now that you are equipped with the basics, go ahead and take the following DataCamp course: [R Intro on Datacamp](https://campus.datacamp.com/courses/free-introduction-to-r). Your invitations should now be in your inbox.


# Part IV: Working directories

You can use the
```{r, eval = FALSE, echo=TRUE}
getwd()
```
command to obtain the current directory `R` is using.

It is good practice to set the working directory location to where the files and data are stored.

- Consider setting your working directory to a folder called AAEC4984, AAEC5484, or STAT5484 on your desktop (for example).

## Creating a directory and setting the working directory

**Windows**
```{r, eval = FALSE, echo=TRUE}
  setwd("C:/users/[your user name]/Desktop/AAEC4984/")
  # OR
  setwd("C:\\users\\[your user name]\\Desktop\\AAEC4984\\")
  # notice the double backslashes
```

**Mac**
```{r, eval = FALSE, echo=TRUE}
  setwd("~/Desktop/AAEC4984")
```

- To check whether the working directory is correct, we again use

```{r, eval = FALSE, echo=TRUE}
getwd()
```

- To obtain a list of the names of files or folders in the working directory, we can use

```{r, eval = FALSE, echo=TRUE}
dir()
```

- To create a new folder in your directory we can use
```{r, eval = FALSE}
dir.create("[Folder name]")
```
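
If you would like to try these commands without touching your own folders, the sketch below works inside `R`'s temporary directory. The folder name `AAEC4984_demo` and the file name `notes.txt` are made up for illustration, and `file.path()` is used to build paths that work on both Windows and Mac.

```{r, echo=TRUE}
demo_dir <- file.path(tempdir(), "AAEC4984_demo")  # hypothetical folder name
if (!dir.exists(demo_dir)) {
  dir.create(demo_dir)          # create it only if it is not already there
}
dir.exists(demo_dir)            # TRUE once the folder exists
file.create(file.path(demo_dir, "notes.txt"))  # add an empty file
dir(demo_dir)                   # list the folder's contents
```

Checking with `dir.exists()` first avoids the warning that `dir.create()` gives when the folder already exists.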

## Importing data

`R` allows us to import several file types. I will discuss 3 that we are most likely to use in this course.

1. Text files
: Data sometimes come with headers (the first row is variable names, not actual data!) You need to tell R that!

```{r, eval= FALSE, echo=TRUE}
textdata <- read.table("examples/hogsdata.txt", header = TRUE)
```

2. CSV files
:
```{r, eval=FALSE, echo=TRUE}
csvdata <- read.csv("examples/hogsdata.csv", header = TRUE)
```

3. xlsx files (requires openxlsx package)
```{r, eval= FALSE, echo=TRUE}
xlsxdata <- openxlsx::read.xlsx("examples/hogsdata.xlsx", ... )
```
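
To see why the `header` argument matters, we can write a tiny CSV file ourselves and read it back both ways. Everything in this sketch is made up for illustration (the numbers, the file name `hogs_demo.csv`, and the object names), and the file is written to `R`'s temporary directory rather than to the course folder.

```{r, echo=TRUE}
hogs_demo <- data.frame(year = c(2019, 2020, 2021),
                        price = c(48.1, 43.3, 67.3))   # made-up numbers
demo_file <- file.path(tempdir(), "hogs_demo.csv")
write.csv(hogs_demo, demo_file, row.names = FALSE)

with_header <- read.csv(demo_file, header = TRUE)
no_header   <- read.csv(demo_file, header = FALSE)
names(with_header)   # "year" "price"
nrow(with_header)    # 3 rows of data
nrow(no_header)      # 4 rows: the header line was read as data
```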


## Functions & Packages

Functions are “canned scripts” that automate more complicated sets of commands, including operations, assignments, etc. For the purpose of this course, we will use a lot of functions that are either built into base R (that is, they are predefined) or available through R packages (discussed below).

A function usually takes one or more inputs called *arguments*, and often (but not always) returns a *value*.

---

Consider for example, taking the average of a set of random numbers (x).

```{r, results='markup', echo=TRUE}
set.seed(124)
x <- rnorm(6) * 100
(round(x, digits=2)) # round function => 2dp
```

If we were to do this manually, we would:

1. Sum up the values
```{r, echo=TRUE}
sumx <- sum(x)
```

2. Get the number of observations
```{r, echo=TRUE}
nx <- length(x)
```

3. Divide sum by total number of observations

```{r, echo=TRUE}
meanx <- sumx/nx
```

Using `R`'s built-in `mean` function, we can do all three steps internally and cross-check the result against our manual calculation.

```{r, echo=TRUE}
mean(x)
all.equal(meanx, mean(x)) # cross-check; all.equal() allows for tiny floating-point differences
```
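
We can also wrap the three manual steps into a function of our own. The sketch below is one possible way to do it; `my_mean` and `demo_vals` are made-up names, and the `na.rm` argument simply mimics the argument of the same name in the built-in `mean()`.

```{r, echo=TRUE}
my_mean <- function(values, na.rm = FALSE) {
  if (na.rm) {
    values <- values[!is.na(values)]   # drop missing values when asked
  }
  sum(values) / length(values)         # the last expression is the return value
}

demo_vals <- c(2, 4, 6, NA)
my_mean(demo_vals)                 # NA, because one value is missing
my_mean(demo_vals, na.rm = TRUE)   # 4
all.equal(my_mean(c(1.5, 2.5, 3.5)), mean(c(1.5, 2.5, 3.5)))
```

Here `values` and `na.rm` are the arguments, and the result of the last line inside the braces is the value the function returns.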

## Installing Packages

Since `R` is open-source software, thousands of people contribute to it. They do this by writing commands (called functions), bundled into packages, that make a particular analysis easier or a graphic prettier.

When you download `R`, you get access to a lot of functions that we will use. However, the user-written packages we use for our analyses will make our lives much easier.

For example, though we can use the `plot` command for standard graphics, you will quickly see that we can get much better looking time series graphs using the `fpp3` package (which also uses, among other packages, `ggplot2`).


## Installing Packages

To install the `fpp3` package, we can use the command
```{r, eval=FALSE, echo=TRUE}
install.packages("fpp3")
```
We will need to install a package only once in `R`.

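Now that you have the `fpp3` package installed, we can check whether it is currently in use. `search()` lists the packages attached to the session, and `fpp3` will not appear there until it has been loaded.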
```{r eval=FALSE, echo=TRUE}
search()
```

Lastly, in order to use the package, we will need to load it with `library()`
```{r eval=TRUE, message= FALSE, echo=TRUE}
library(fpp3)
```
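
Since a package only needs to be installed once, it can be handy to check whether it is already there before installing it again. The sketch below is one way to do that with base `R`'s `requireNamespace()`; the helper name `pkg_installed` is made up for illustration, and the code only prints a reminder rather than installing anything.

```{r, echo=TRUE}
# Hypothetical helper: TRUE if the package can be found on this computer
pkg_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

pkg_installed("stats")   # stats ships with R, so this is TRUE

if (!pkg_installed("fpp3")) {
  message('fpp3 is not installed yet; run install.packages("fpp3") once.')
}
```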

## Using libraries

The `fpp3` package contains a number of useful datasets. One such data set is `us_gasoline` (Weekly US finished motor gasoline product supplied).

Use the `help()` function to get a description of this data. Try
```{r,eval= TRUE, echo=TRUE}
help(us_gasoline)
```

Now let us create a nice plot of the `us_gasoline` data. 

```{r , fig.height= 2.5, echo=TRUE}
autoplot(us_gasoline, col = "darkgreen") +
  #Now to add axis labels
  labs(title = "Weekly US finished motor gasoline product supplied",
       y = "Mbd", caption = "Source: fpp3 package") + 
  theme_light() # One of many ggplot2 themes
```

---

Let us leave it there for now!

---





2.1 Comments
Whenever possible, use comments! Anything following the symbol # in an R script will not be run in R. Comments are notes we leave ourselves so we know:
I promise that this will become useful when you come back to your code after an extended time. I cannot tell you the number of times I have had a moment of pure genius while coding and I spend hours on a different day trying to understand why I coded it like that or what I actually did.
For example, below is the type of comment that I always include in my programs
You can also understand the following code without even knowing what exactly each line of command does because I tell you what they are!